# %load_ext pretty_jupyter
import sqlite3
import pandas as pd
import cpi
from itertools import product
from IPython.display import display_html
import statsmodels.formula.api as smf
def custom_formatter(value):
''' Helper function for formatting numbers in tables '''
return '{:,.1%}'.format(value) if type(value) != str else value
# cpi.update()
cpi_data = {year: cpi.inflate(1, year) for year in range(1970, 2023)}
def inflate(money):
''' Helper function because cpi function is slow '''
return money*pd.Series(data=money.index, index=money.index).map(cpi_data)
# Database connection for subsetting data below
conn = sqlite3.connect('ccd_db/state/data/state.db')
cursor = conn.cursor()
# Helper mapping for making table displays prettier/more readable.
state_names = pd.read_html('https://en.wikipedia.org/wiki/Federal_Information_Processing_Standard_state_code')[0][['Name', 'Alpha code']]
state_name_mapping = state_names.set_index('Alpha code')['Name'].to_dict()
0. Skills/Tools¶
- Data cleaning/python scripting. Packages:
pandas,requests,sqlite3 - SQL database creation and querying. Engine: sqlite3
- Data visualization. Software: Tableau
- Statistical modeling
1. Introduction¶
I've written two reports (here and here) analyzing academic achievement and education spending data for the state of Utah. I found that there was essentially no evidence of changes in spending affecting academic achievement and became curious if this generalized to the entire US. TL;DR: There is a very, very weak positive relationship between per-student, inflation-adjusted spending and education efficacy. Two academic subjects were examined (math and reading) and only math exhibited this relationship.
Data Used:¶
There were two primary datasets that were utilized.
- Per-student spending data was calculated from the Common Core of Data, published by the National Center for Education Statistics.
- Education efficacy data was obtained from the National Assessment for Educational Progress (NAEP), commonly referred to as the Nations Report Card.
NAEP administers a variety of tests at different grade levels, subjects, and other parameters but to simplify analysis, only 8th grade results in math and reading were used. In some of the visualizations of the NAEP data below, there is a distinction between NAEP scores "with accommodations" and scores "without accommodations". In 1996, NAEP began to introduce accommodations for students with disabilities and English learners and these results were reported alongside those without accommodations. In 2002, testing without accommodations was stopped and only results with accommodations were reported thereafter. For more information, see here
All this data was downloaded, cleaned, organized, and compiled into a SQL database where is queried in this report for analysis.
2. Academic Trends¶
Each state has it's own public education journey but there are some general trends that can be easily observed. NAEP math scores in almost all states were going up until approximately 2013. However, after 2013 math scores began to decline. NAEP reading scores were largely flat until 2013 after which scores began to decline. The declines have been so great that the median NAEP math score (273) is now were it was in 2000, erasing years of gains. And the median NAEP reading score (257) is the lowest since the reading tests were first administered in 1998. Visualized here:
# Query obtaining math and reading scores.
cursor.execute('''
SELECT
end_year,
jurisdiction,
math_read as subject,
accommodations,
mean as avg_naep
FROM
naep
WHERE
grade = 8
AND jurisdiction NOT LIKE 'X%'
AND jurisdiction NOT IN ('NT', 'NP', 'NR', 'AS', 'GU', 'VI', 'YA',
'PR', 'DS', 'NL', 'DC')
;
''')
naep = pd.DataFrame(cursor.fetchall(),
columns=[column[0] for column in cursor.description])
naep.to_csv('naep.csv', index=False)
%%html
<div class='tableauPlaceholder' id='viz1743784225298' style='position: relative'><noscript><a href='#'><img alt='NAEP Scores over Time by State8th Grade. R2 median in green. R3 median in red. ' src='https://public.tableau.com/static/images/NA/NAEPScoresoverTimebyState/NAEPScoresoverTimebyState/1_rss.png' style='border: none' /></a></noscript><object class='tableauViz' style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='NAEPScoresoverTimebyState/NAEPScoresoverTimebyState' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https://public.tableau.com/static/images/NA/NAEPScoresoverTimebyState/NAEPScoresoverTimebyState/1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /><param name='filter' value='publish=yes' /></object></div> <script type='text/javascript'> var divElement = document.getElementById('viz1743784225298'); var vizElement = divElement.getElementsByTagName('object')[0]; vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px'; var scriptElement = document.createElement('script'); scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js'; vizElement.parentNode.insertBefore(scriptElement, vizElement); </script>
<br>
2003 was the first year that NAEP tests with accommodations were administered to all states in reading and math. If we look at the percentage change in NAEP scores by state, we begin to see that in reading and math only 3 states for each subject saw improvements of over 1% in either test. So top performers over this period are more cases of "not failing" rather than improving.
Top/Bottom 5 % Growth in NAEP Math Since 2003 (with accommodations)¶
# Query obtaining top/bottom states by % change in NAEP math scores.
cursor.execute('''
SELECT
end_year,
jurisdiction,
mean as avg_naep
FROM
naep
WHERE
math_read = 'MAT'
AND accommodations = 'R3'
AND grade = 8
AND jurisdiction NOT LIKE 'X%'
AND jurisdiction NOT IN ('NT', 'NP', 'NR', 'AS', 'GU', 'VI', 'YA',
'PR', 'DS', 'NL', 'DC')
;
''')
naep_math = pd.DataFrame(cursor.fetchall(),
columns=[column[0] for column in cursor.description])
# Calculate change over time span that we have R3, grade 8 scores (since 2003)
naep_math_pivot = naep_math.pivot(columns=['end_year'],
index=['jurisdiction'],
values='avg_naep')
naep_math_change = (
naep_math_pivot[2024]
.div(naep_math_pivot[2003])
.subtract(1)
.sort_values(ascending=False)
.rename('Percent Change')
)
naep_math_top = pd.DataFrame(naep_math_change.head()).reset_index()
naep_math_bottom = pd.DataFrame(naep_math_change.tail()).reset_index()
# Formatting for table display
top_math_styler = (
naep_math_top
.replace({'jurisdiction': state_name_mapping})
.rename(columns={'jurisdiction': 'State'})
.style
.set_table_attributes("style='display:inline; width: 100%;'")
.set_caption('Top 5 - % Growth in NAEP Math')
.set_table_styles([
{'selector': 'caption',
'props': [('white-space', 'nowrap'), ('text-align', 'left'), ('font-size', '16px')]}
])
.hide()
.format({'Percent Change': '{:,.1%}'.format}, precision=2)
)
bottom_math_styler = (
naep_math_bottom
.replace({'jurisdiction': state_name_mapping})
.rename(columns={'jurisdiction': 'State'})
.style
.set_table_attributes("style='display:inline;'")
.set_table_styles([
{'selector': 'caption',
'props': [('white-space', 'nowrap'), ('text-align', 'left'), ('font-size', '16px')]}
])
.set_caption('Bottom 5 - % Growth in NAEP Math')
.hide()
.format({'Percent Change': '{:,.1%}'.format}, precision=2)
)
display_html(top_math_styler._repr_html_()+10*' '+bottom_math_styler._repr_html_()+'<br>', raw=True)
| State | Percent Change |
|---|---|
| Mississippi | 3.1% |
| Tennessee | 2.9% |
| Hawaii | 1.6% |
| California | 0.6% |
| Utah | 0.4% |
| State | Percent Change |
|---|---|
| Kansas | -3.5% |
| West Virginia | -3.7% |
| Oregon | -4.6% |
| Delaware | -5.1% |
| Alaska | -5.4% |
Top/Bottom 5 % Growth in NAEP Reading Since 2003 (with accommodations)¶
# Query obtaining top/bottom states by % change in NAEP reading scores.
cursor.execute('''
SELECT
end_year,
jurisdiction,
mean as avg_naep
FROM
naep
WHERE
math_read = 'RED'
AND accommodations = 'R3'
AND grade = 8
AND jurisdiction NOT LIKE 'X%'
AND jurisdiction NOT IN ('NT', 'NP', 'NR', 'AS', 'GU', 'VI', 'YA',
'PR', 'DS', 'NL', 'DC')
;
''')
naep_read = pd.DataFrame(cursor.fetchall(),
columns=[column[0] for column in cursor.description])
# Calculate change over time span that we have R3, grade 8 scores (since 2003)
naep_read_pivot = naep_read.pivot(columns=['end_year'],
index=['jurisdiction'],
values='avg_naep')
naep_read_change = (
naep_read_pivot[2024]
.div(naep_read_pivot[2003])
.subtract(1)
.sort_values(ascending=False)
.rename('Percent Change')
)
naep_read_top = pd.DataFrame(naep_read_change.head()).reset_index()
naep_read_bottom = pd.DataFrame(naep_read_change.tail()).reset_index()
# Formatting for table display
top_reading_styler = (
naep_read_top
.replace({'jurisdiction': state_name_mapping})
.rename(columns={'jurisdiction': 'State'})
.style
.set_table_attributes("style='display:inline'")
.set_caption('Top 5 - % Growth in NAEP Reading')
.set_table_styles([
{'selector': 'caption',
'props': [('white-space', 'nowrap'), ('text-align', 'left'), ('font-size', '16px')]}
])
.hide()
.format({'Percent Change': '{:,.1%}'.format}, precision=2)
)
bottom_reading_styler = (
naep_read_bottom
.replace({'jurisdiction': state_name_mapping})
.rename(columns={'jurisdiction': 'State'})
.style
.set_table_attributes("style='display:inline'")
.set_caption('Bottom 5 - % Growth in NAEP Reading')
.set_table_styles([
{'selector': 'caption',
'props': [('white-space', 'nowrap'), ('text-align', 'left'), ('font-size', '16px')]}
])
.hide()
.format({'Percent Change': '{:,.1%}'.format}, precision=2)
)
display_html(top_reading_styler._repr_html_()+10*' '+bottom_reading_styler._repr_html_(), raw=True)
| State | Percent Change |
|---|---|
| Hawaii | 2.4% |
| California | 1.3% |
| Louisiana | 1.2% |
| Georgia | 0.6% |
| Nevada | 0.4% |
| State | Percent Change |
|---|---|
| West Virginia | -4.7% |
| Vermont | -4.9% |
| Oklahoma | -4.9% |
| Maine | -4.9% |
| Delaware | -5.7% |
3. Spending Trends¶
Similar to the academic trends observed above, spending trends are roughly the same across states. We can see in the graph below that almost all states saw an increase in the amount of per student spending from 1987 to 2009, a decline from 2009 to 2013, and a subsequent continuation of the increase in spending.
# Query obtaining spending per student per state.
cursor.execute('''
SELECT
fiscal.end_year,
fiscal.te11 / membership.student_count as exp_per_stu,
fiscal.stabr
FROM
fiscal
INNER JOIN
membership
ON fiscal.end_year = membership.end_year
AND fiscal.fipst = membership.fipst
WHERE
membership.total_indicator = 'Derived - Education Unit Total minus Adult Education Count'
AND fiscal.stabr NOT IN ('GU', 'VI','AS', 'PR', 'DC', 'MP')
;
''')
fiscal = pd.DataFrame(cursor.fetchall(),
columns=[column[0] for column in cursor.description])
# Adjust spending for inflation
fiscal['exp_ia'] = fiscal['exp_per_stu'] * fiscal['end_year'].map(cpi_data)
fiscal = fiscal.drop(columns=['exp_per_stu'])
# Print fiscal table to file for distribution of spending over time line chart viz.
# fiscal.to_csv('fiscal.csv', index=False)
%%html
<div class='tableauPlaceholder' id='viz1743461013621' style='position: relative'><noscript><a href='#'><img alt='Spending per Student over TimeMedian in red ' src='https://public.tableau.com/static/images/Sp/SpendingperStudentoverTimebyState/SpendingperStudentoverTime/1_rss.png' style='border: none' /></a></noscript><object class='tableauViz' style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='SpendingperStudentoverTimebyState/SpendingperStudentoverTime' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https://public.tableau.com/static/images/Sp/SpendingperStudentoverTimebyState/SpendingperStudentoverTime/1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /><param name='filter' value='publish=yes' /></object></div> <script type='text/javascript'> var divElement = document.getElementById('viz1743461013621'); var vizElement = divElement.getElementsByTagName('object')[0]; vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px'; var scriptElement = document.createElement('script'); scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js'; vizElement.parentNode.insertBefore(scriptElement, vizElement); </script>
<br>
I would like to note that these spending figures are inflation adjusted to 2025 dollars and that these are per-student spending amounts. In light of this, I think it's startling just how great the increases in spending are.
Top/Bottom 5 in % Change in Spending Since 1987¶
# Table showing top and bottom 5 % change since 1987
fiscal_pivot = fiscal.pivot(columns=['end_year'], index=['stabr'], values='exp_ia')
fiscal_change = pd.DataFrame({
'Since 1987': (fiscal_pivot[2022].div(fiscal_pivot[1987]) - 1),
'Since 2003': (fiscal_pivot[2022].div(fiscal_pivot[2003]) - 1),
})
fiscal_1987_change = (
fiscal_change
.sort_values(by='Since 1987', ascending=False)
.reset_index()
.loc[:, ['stabr', 'Since 1987']]
)
fiscal_1987_top = fiscal_1987_change.head()
fiscal_1987_bottom = fiscal_1987_change.tail()
# Formatting for table display
fiscal_1987_top_styler = (
fiscal_1987_top
.replace({'stabr': state_name_mapping})
.rename(columns={'stabr': 'State', 'Since 1987': 'Percent Change'})
.style
.set_table_attributes("style='display:inline'")
.set_caption('Top 5 - % Change in Per Student Spending')
.set_table_styles([
{'selector': 'caption',
'props': [('white-space', 'nowrap'), ('text-align', 'left'), ('font-size', '16px')]}
])
.hide()
.format({'Percent Change': '{:,.1%}'.format}, precision=2)
)
fiscal_1987_bottom_styler = (
fiscal_1987_bottom
.replace({'stabr': state_name_mapping})
.rename(columns={'stabr': 'State', 'Since 1987': 'Percent Change'})
.style
.set_table_attributes("style='display:inline'")
.set_caption('Bottom 5 - % Change in Per Student Spending')
.set_table_styles([
{'selector': 'caption',
'props': [('white-space', 'nowrap'), ('text-align', 'left'), ('font-size', '16px')]}
])
.hide()
.format({'Percent Change': '{:,.1%}'.format}, precision=2)
)
display_html(fiscal_1987_top_styler._repr_html_()+' '+' '+' '+fiscal_1987_bottom_styler._repr_html_(), raw=True)
| State | Percent Change |
|---|---|
| Vermont | 146.9% |
| Kentucky | 136.3% |
| Illinois | 133.0% |
| New Hampshire | 125.6% |
| North Dakota | 122.8% |
| State | Percent Change |
|---|---|
| Montana | 56.3% |
| Nevada | 54.7% |
| Florida | 44.7% |
| Arizona | 37.3% |
| Alaska | 19.3% |
Top/Bottom 5 in % Change in Spending Since 2003¶
Since the NAEP scores for NAEP tests with accommodations only exist from 2003 and after, here are the same tables but calculated since 2003:
# Table showing top and bottom 5 % change since 2003
fiscal_2003_change = (
fiscal_change
.sort_values(by='Since 2003', ascending=False)
.reset_index()
.loc[:, ['stabr', 'Since 2003']]
)
fiscal_2003_top = fiscal_2003_change.head()
fiscal_2003_bottom = fiscal_2003_change.tail()
# Formatting for table display
fiscal_2003_top_styler = (
fiscal_2003_top
.replace({'stabr': state_name_mapping})
.rename(columns={'stabr': 'State', 'Since 2003': 'Percent Change'})
.style
.set_table_attributes("style='display:inline'")
.set_caption('Top 5 - % Change in Per Student Spending')
.set_table_styles([
{'selector': 'caption',
'props': [('white-space', 'nowrap'), ('text-align', 'left'), ('font-size', '16px')]}
])
.hide()
.format({'Percent Change': '{:,.1%}'.format}, precision=2)
)
fiscal_2003_bottom_styler = (
fiscal_2003_bottom
.replace({'stabr': state_name_mapping})
.rename(columns={'stabr': 'State', 'Since 2003': 'Percent Change'})
.style
.set_table_attributes("style='display:inline'")
.set_caption('Bottom 5 - % Change in Per Student Spending')
.set_table_styles([
{'selector': 'caption',
'props': [('white-space', 'nowrap'), ('text-align', 'left'), ('font-size', '16px')]}
])
.hide()
.format({'Percent Change': '{:,.1%}'.format}, precision=2)
)
display_html(fiscal_2003_top_styler._repr_html_()+' '+' '+' '+fiscal_2003_bottom_styler._repr_html_()+'<br>', raw=True)
| State | Percent Change |
|---|---|
| Vermont | 49.8% |
| North Dakota | 48.1% |
| Washington | 45.3% |
| New York | 41.8% |
| Hawaii | 39.7% |
| State | Percent Change |
|---|---|
| Georgia | 4.4% |
| Nevada | 3.5% |
| Michigan | 2.0% |
| Idaho | -1.6% |
| Indiana | -5.0% |
4. Visual Comparison of Trends¶
Inspecting the top/bottom 5 tables above, besides Hawaii, no top spending growth state is a top academic improver and no bottom spending growth state is a bottom academic improver. This begins to suggest that perhaps there may not be a relationship between spending and academic success.
In the graphic below, we get some more clues that the relationship is tenuous. The big takeaway from this visualization is that no matter what settings are used for time span or subject, the two maps don't look similar. That is to say, it doesn't appear that states that increased spending more have increased their test scores.
# Queries obtaining data to generate map viz of NAEP and spending % change.
#####################
### NAEP % CHANGE ###
#####################
cursor.execute('''
SELECT
end_year,
jurisdiction,
math_read as subject,
mean as avg_naep
FROM
naep
WHERE
accommodations = 'R3'
AND grade = 8
AND jurisdiction NOT LIKE 'X%'
AND jurisdiction NOT IN ('NT', 'NP', 'NR', 'AS', 'GU', 'VI', 'YA',
'PR', 'DS', 'NL', 'DC')
;
''')
naep_map = pd.DataFrame(cursor.fetchall(),
columns=[column[0] for column in cursor.description])
naep_map_pivot = naep_map.pivot(index=['jurisdiction', 'subject'],
columns='end_year',
values='avg_naep')
naep_map_change = (
naep_map_pivot[2024]
.div(naep_map_pivot[2003])
.subtract(1)
.rename('Percent Change')
)
naep_map_change.to_csv('naep_change.csv')
#########################
### Spending % CHANGE ###
#########################
cursor.execute('''
SELECT
fiscal.end_year,
fiscal.te11 / membership.student_count as exp_per_stu,
fiscal.stabr
FROM
fiscal
INNER JOIN
membership
ON fiscal.end_year = membership.end_year
AND fiscal.fipst = membership.fipst
WHERE
membership.total_indicator = 'Derived - Education Unit Total minus Adult Education Count'
AND fiscal.stabr NOT IN ('GU', 'VI','AS', 'PR', 'DC', 'MP')
;
''')
fiscal_map = pd.DataFrame(cursor.fetchall(),
columns=[column[0] for column in cursor.description])
# Adjust spending for inflation
fiscal_map['exp_ia'] = fiscal_map['exp_per_stu'] * fiscal_map['end_year'].map(cpi_data)
fiscal_map = fiscal_map.drop(columns=['exp_per_stu'])
fiscal_map_pivot = fiscal_map.pivot(columns=['end_year'], index=['stabr'], values='exp_ia')
fiscal_map_change = pd.DataFrame({
'Since 1987': (fiscal_map_pivot[2022].div(fiscal_map_pivot[1987]) - 1),
'Since 2003': (fiscal_map_pivot[2022].div(fiscal_map_pivot[2003]) - 1),
})
(
fiscal_map_change
.reset_index()
.melt(id_vars='stabr',
value_vars=['Since 1987', 'Since 2003'],
value_name='Percent Change in Spending',
var_name='Timespan')
.to_csv('fiscal_change.csv', index=False)
)
%%html
<div class='tableauPlaceholder' id='viz1744061276835' style='position: relative'><noscript><a href='#'><img alt='Dashboard 1 ' src='https://public.tableau.com/static/images/Ma/Mapsofedchanges/Dashboard1/1_rss.png' style='border: none' /></a></noscript><object class='tableauViz' style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='Mapsofedchanges/Dashboard1' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https://public.tableau.com/static/images/Ma/Mapsofedchanges/Dashboard1/1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /><param name='filter' value='publish=yes' /></object></div> <script type='text/javascript'> var divElement = document.getElementById('viz1744061276835'); var vizElement = divElement.getElementsByTagName('object')[0]; if ( divElement.offsetWidth > 800 ) { vizElement.style.minWidth='420px';vizElement.style.maxWidth='900px';vizElement.style.width='100%';vizElement.style.minHeight='587px';vizElement.style.maxHeight='1127px';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.minWidth='420px';vizElement.style.maxWidth='900px';vizElement.style.width='100%';vizElement.style.minHeight='587px';vizElement.style.maxHeight='1127px';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else { vizElement.style.width='100%';vizElement.style.height='1027px';} var scriptElement = document.createElement('script'); scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js'; vizElement.parentNode.insertBefore(scriptElement, vizElement); </script>
<br>
For the curious reader, the following combo-graph can be used to inspect spending and NAEP score changes for each individual state:
# Table generated to display line graphs together and to place null values
# where appropriate to make graphs pretty.
cursor.execute('''
SELECT
fiscal.end_year,
SUM(fiscal.te11) / SUM(membership.student_count) as avg_exp_per_stu,
fiscal.stabr as jurisdiction
FROM
fiscal
INNER JOIN
membership
ON
fiscal.end_year = membership.end_year
AND
fiscal.fipst = membership.fipst
WHERE
membership.total_indicator = 'Derived - Education Unit Total minus Adult Education Count'
GROUP BY
fiscal.end_year, fiscal.fipst
;
''')
fiscal = pd.DataFrame(cursor.fetchall(),
columns=[column[0] for column in cursor.description])
# Calculate national expenditures per student (coded as NT)
cursor.execute('''
SELECT
fiscal.end_year,
SUM(fiscal.te11) / SUM(membership.student_count) as avg_exp_per_stu,
'NT' as jurisdiction
FROM
fiscal
INNER JOIN
membership
ON
fiscal.end_year = membership.end_year
AND
fiscal.fipst = membership.fipst
WHERE
membership.total_indicator = 'Derived - Education Unit Total minus Adult Education Count'
GROUP BY
fiscal.end_year
;
''')
fiscal_nt = pd.DataFrame(cursor.fetchall(),
columns=[column[0] for column in cursor.description])
fiscal = pd.concat([fiscal, fiscal_nt])
# Adjust for inflation
fiscal['exp_per_stu_ia'] = fiscal['avg_exp_per_stu'] * fiscal['end_year'].map(cpi_data)
fiscal = fiscal.drop(columns=['avg_exp_per_stu'])
# Obtain table of NAEP Scores.
cursor.execute('''
SELECT
end_year,
jurisdiction,
math_read as subject,
accommodations,
mean as avg_naep
FROM
naep
WHERE
grade = 8
AND jurisdiction NOT LIKE 'X%'
AND jurisdiction NOT IN ('NP', 'NR', 'AS', 'GU', 'VI', 'YA',
'PR', 'DS', 'NL', 'DC')
;
''')
naep = pd.DataFrame(cursor.fetchall(),
columns=[column[0] for column in cursor.description])
filters = [range(1987, 2025),
naep['jurisdiction'].unique(),
['R2', 'R3'],
['MAT', 'RED']]
filter_product = pd.DataFrame(list(product(*filters)),
columns=['end_year', 'jurisdiction', 'accommodations', 'subject'])
# filter_product
naep_exp_with_null = (
filter_product
.merge(fiscal,
on=['end_year', 'jurisdiction'],
how='left')
.merge(naep,
on=['end_year', 'jurisdiction', 'subject', 'accommodations'],
how='left')
)
naep_exp_with_null['jurisdiction'] = (
naep_exp_with_null['jurisdiction']
.replace({'NT': 'National Average'})
)
naep_exp_with_null.to_csv('naep_exp_with_null.csv', index=False)
%%html
<div class='tableauPlaceholder' id='viz1744057974228' style='position: relative'><noscript><a href='#'><img alt='Dashboard 2 ' src='https://public.tableau.com/static/images/Sp/SpendingandNAEPperstatedashboard/Dashboard2/1_rss.png' style='border: none' /></a></noscript><object class='tableauViz' style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='SpendingandNAEPperstatedashboard/Dashboard2' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https://public.tableau.com/static/images/Sp/SpendingandNAEPperstatedashboard/Dashboard2/1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /><param name='filter' value='publish=yes' /></object></div> <script type='text/javascript'> var divElement = document.getElementById('viz1744057974228'); var vizElement = divElement.getElementsByTagName('object')[0]; if ( divElement.offsetWidth > 800 ) { vizElement.style.minWidth='420px';vizElement.style.maxWidth='800px';vizElement.style.width='100%';vizElement.style.minHeight='587px';vizElement.style.maxHeight='1027px';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.minWidth='420px';vizElement.style.maxWidth='800px';vizElement.style.width='100%';vizElement.style.minHeight='587px';vizElement.style.maxHeight='1027px';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';} else { vizElement.style.width='100%';vizElement.style.height='827px';} var scriptElement = document.createElement('script'); scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js'; vizElement.parentNode.insertBefore(scriptElement, vizElement); </script>
<br>
5. Models¶
The visualizations above offer some clues, suggesting that there may not be a terribly strong relationship between per-student spending and NAEP scores. To further inspect the data, let's fit some linear models to see if there's any evidence of a relationship that can't be seen in visualizations.
# Query to generate table used for models.
cursor.execute('''
WITH fiscal_cte as (
SELECT
fiscal.end_year,
fiscal.te11 as total_exp,
membership.student_count as total_mem,
SUM(fiscal.te11) / SUM(membership.student_count) as exp_per_stu,
fiscal.fipst,
fiscal.stabr
FROM fiscal
INNER JOIN membership
ON fiscal.end_year = membership.end_year
AND fiscal.fipst = membership.fipst
WHERE membership.total_indicator = 'Derived - Education Unit Total minus Adult Education Count'
GROUP BY fiscal.end_year, fiscal.fipst
),
naep_cte as (
SELECT
end_year,
math_read,
mean as avg_naep,
accommodations,
jurisdiction
FROM
naep
WHERE
grade = 8
)
SELECT
COALESCE(naep_cte.end_year, fiscal_cte.end_year) as end_year,
COALESCE(naep_cte.jurisdiction, fiscal_cte.stabr) as state,
naep_cte.math_read,
naep_cte.avg_naep,
naep_cte.accommodations,
fiscal_cte.exp_per_stu
--fiscal_cte.total_exp,
--fiscal_cte.total_mem
FROM fiscal_cte
FULL JOIN naep_cte
ON fiscal_cte.end_year = naep_cte.end_year
AND fiscal_cte.stabr = naep_cte.jurisdiction
WHERE
naep_cte.jurisdiction NOT LIKE 'X%'
AND naep_cte.jurisdiction NOT IN ('NT', 'NP', 'NR', 'AS', 'GU', 'VI', 'YA',
'PR', 'DS', 'NL', 'DC')
;
''')
df = pd.DataFrame(cursor.fetchall(),
columns=[column[0] for column in cursor.description])
# Adjust for inflation
df['exp_per_stu_ia'] = (
df['exp_per_stu'] * df['end_year'].map(cpi_data)
)
df = df.drop(columns=['exp_per_stu'])
# Create z-scores for future plotting/analysis
df['z_exp_per_stu_ia'] = df.groupby('state')['avg_naep'].transform(lambda x: (x - x.mean()) / x.std())
df.to_csv('naep_fiscal_total.csv', index=False)
First, let's use a simple linear model, notated as: $$ NAEPscore_{i,j} \sim spending_{i,j} + accom_{i,j} + subject_{i,j} $$ where $i = state$ and $j=year$.
# VERY SIMPLE OLS MODEL
# Reduce scale of expenditure per student.
df['exp_per_stu_ia_scaled'] = df['exp_per_stu_ia'] / df['exp_per_stu_ia'].mean()
model = smf.ols('avg_naep ~ exp_per_stu_ia_scaled + math_read + accommodations', data=df).fit()
print(model.summary())
OLS Regression Results
==============================================================================
Dep. Variable: avg_naep R-squared: 0.579
Model: OLS Adj. R-squared: 0.578
Method: Least Squares F-statistic: 597.8
Date: Sat, 12 Apr 2025 Prob (F-statistic): 2.10e-244
Time: 09:30:32 Log-Likelihood: -4403.2
No. Observations: 1309 AIC: 8814.
Df Residuals: 1305 BIC: 8835.
Df Model: 3
Covariance Type: nonrobust
=========================================================================================
coef std err t P>|t| [0.025 0.975]
-----------------------------------------------------------------------------------------
Intercept 262.9744 0.789 333.209 0.000 261.426 264.523
math_read[T.RED] -15.8036 0.399 -39.595 0.000 -16.587 -15.021
accommodations[T.R3] 6.5232 0.595 10.970 0.000 5.357 7.690
exp_per_stu_ia_scaled 9.7450 0.766 12.719 0.000 8.242 11.248
==============================================================================
Omnibus: 23.987 Durbin-Watson: 1.259
Prob(Omnibus): 0.000 Jarque-Bera (JB): 22.914
Skew: -0.286 Prob(JB): 1.06e-05
Kurtosis: 2.697 Cond. No. 9.33
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
This shows that there is a positive relationship between spending and NAEP scores: the estimated coefficient of the spending per student term, exp_per_stu_ia_scaled1, is positive with p-value $<.0001$. And this tracks with the following scatter plots. You can see that at every combination of subject and accommodation there's a positive relationship between per-student spending and NAEP scores.
%%html
<div class='tableauPlaceholder' id='viz1744151814181' style='position: relative'><noscript><a href='#'><img alt='NAEP vs Spending- across subject and accommodations ' src='https://public.tableau.com/static/images/NA/NAEPvsSpendingpanel/NAEPvsSpending/1_rss.png' style='border: none' /></a></noscript><object class='tableauViz' style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='NAEPvsSpendingpanel/NAEPvsSpending' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https://public.tableau.com/static/images/NA/NAEPvsSpendingpanel/NAEPvsSpending/1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /></object></div> <script type='text/javascript'> var divElement = document.getElementById('viz1744151814181'); var vizElement = divElement.getElementsByTagName('object')[0]; vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px'; var scriptElement = document.createElement('script'); scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js'; vizElement.parentNode.insertBefore(scriptElement, vizElement); </script>
<br>
But things may not be as they seem! This does not account for the differences between states. Let's standardize NAEP scores for each state then plot those against spending per student.
%%html
<div class='tableauPlaceholder' id='viz1744151649235' style='position: relative'><noscript><a href='#'><img alt='Standardized NAEP vs Spending- across subject and accommodations ' src='https://public.tableau.com/static/images/St/StandardizedNAEPvsSpendingpanel/StandardizedNAEPvsSpending/1_rss.png' style='border: none' /></a></noscript><object class='tableauViz' style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='StandardizedNAEPvsSpendingpanel/StandardizedNAEPvsSpending' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https://public.tableau.com/static/images/St/StandardizedNAEPvsSpendingpanel/StandardizedNAEPvsSpending/1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /><param name='filter' value='publish=yes' /></object></div> <script type='text/javascript'> var divElement = document.getElementById('viz1744151649235'); var vizElement = divElement.getElementsByTagName('object')[0]; vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px'; var scriptElement = document.createElement('script'); scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js'; vizElement.parentNode.insertBefore(scriptElement, vizElement); </script>
<br>
Now the positive relationship seems to have greatly diminished, if not all but disappeared. That is, after accounting for differences between states, there is no longer an obvious positive association between per-student spending and NAEP scores. Let's add a term for the effect of state (a fixed effect for state) to the model above:
$$ NAEPscore_{i,j} \sim spending_{i,j} + accom_{i,j} + subject_{i,j} + F_{i} $$where $i = state$, $j=year$, and $F_{i}$ is a fixed effect for state.
# MODEL WITH STATE AS FIXED EFFECT
# One could reason that the fixed effect model should be used for no other reason than
# the states don't come from a distribution; they are the entire population!
from linearmodels.panel import PanelOLS
# Convert to panel data format
df = df.set_index(["state", "end_year"])
# Fixed effect model
model_fe = PanelOLS.from_formula("avg_naep ~ exp_per_stu_ia_scaled + math_read + accommodations + EntityEffects", df.dropna()).fit()
print(model_fe.summary)
PanelOLS Estimation Summary
================================================================================
Dep. Variable: avg_naep R-squared: 0.7509
Estimator: PanelOLS R-squared (Between): 0.1655
No. Observations: 1309 R-squared (Within): 0.7509
Date: Sat, Apr 12 2025 R-squared (Overall): 0.5639
Time: 09:31:24 Log-likelihood -3808.0
Cov. Estimator: Unadjusted
F-statistic: 1262.3
Entities: 50 P-value 0.0000
Avg Obs: 26.180 Distribution: F(3,1256)
Min Obs: 20.000
Max Obs: 28.000 F-statistic (robust): 3.007e+04
P-value 0.0000
Time periods: 16 Distribution: F(3,1256)
Avg Obs: 81.812
Min Obs: 37.000
Max Obs: 100.000
Parameter Estimates
=========================================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
-----------------------------------------------------------------------------------------
exp_per_stu_ia_scaled 4.6120 1.1104 4.1535 0.0000 2.4336 6.7905
math_read[MAT] 267.47 0.9412 284.19 0.0000 265.63 269.32
math_read[RED] 251.78 0.9596 262.39 0.0000 249.90 253.67
accommodations[T.R3] 7.2039 0.4552 15.824 0.0000 6.3107 8.0970
=========================================================================================
F-test for Poolability: 38.011
P-value: 0.0000
Distribution: F(49,1256)
Included effects: Entity
The term for spending per student is still statistically significant, however the parameter estimate has decreased greatly (from 9.745 to 4.6120) and the T-stat has also decreased a lot (12.719 to 4.1535). Furthermore, the R-squared has gone up a great deal (0.579 to 0.7509) indicating that a large portion of the change in NAEP scores is explained by the fixed effect for state.
Also, notice what happens when the term for per-student spending is removed: R-squared barely goes down.
# Model with no effect for spending.
model_fe_no_spending = PanelOLS.from_formula("avg_naep ~ math_read + accommodations + EntityEffects", df).fit()
print(model_fe_no_spending.summary)
PanelOLS Estimation Summary
================================================================================
Dep. Variable: avg_naep R-squared: 0.7205
Estimator: PanelOLS R-squared (Between): 0.0505
No. Observations: 1409 R-squared (Within): 0.7205
Date: Sat, Apr 12 2025 R-squared (Overall): 0.5173
Time: 09:31:40 Log-likelihood -4202.5
Cov. Estimator: Unadjusted
F-statistic: 1748.6
Entities: 50 P-value 0.0000
Avg Obs: 28.180 Distribution: F(2,1357)
Min Obs: 22.000
Max Obs: 30.000 F-statistic (robust): 2.92e+05
P-value 0.0000
Time periods: 17 Distribution: F(2,1357)
Avg Obs: 82.882
Min Obs: 37.000
Max Obs: 100.000
Parameter Estimates
========================================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
----------------------------------------------------------------------------------------
math_read[MAT] 271.12 0.3555 762.66 0.0000 270.43 271.82
math_read[RED] 255.44 0.4138 617.23 0.0000 254.63 256.25
accommodations[T.R3] 7.7738 0.3894 19.964 0.0000 7.0099 8.5377
========================================================================================
F-test for Poolability: 40.793
P-value: 0.0000
Distribution: F(49,1357)
Included effects: Entity
This would suggest that spending per student, while statistically significant in explaining changes in NAEP scores, is not practically significant.
Moreover, if we fit a model for only the reading NAEP scores, we can say that spending doesn't have anything to do with reading scores (p-value = 0.4825). But this was already obvious from the above visualizations.
df_read_r3 = df.loc[(df['math_read'] == 'RED')]
fdsa = PanelOLS.from_formula("avg_naep ~ exp_per_stu_ia_scaled + accommodations + EntityEffects", df_read_r3).fit()
print(fdsa.summary)
PanelOLS Estimation Summary
================================================================================
Dep. Variable: avg_naep R-squared: 0.0050
Estimator: PanelOLS R-squared (Between): -0.0305
No. Observations: 613 R-squared (Within): 0.0050
Date: Sat, Apr 12 2025 R-squared (Overall): -0.0207
Time: 09:31:46 Log-likelihood -1538.7
Cov. Estimator: Unadjusted
F-statistic: 1.4087
Entities: 50 P-value 0.2453
Avg Obs: 12.260 Distribution: F(2,561)
Min Obs: 10.0000
Max Obs: 13.000 F-statistic (robust): 3.096e+04
P-value 0.0000
Time periods: 12 Distribution: F(2,561)
Avg Obs: 51.083
Min Obs: 41.000
Max Obs: 72.000
Parameter Estimates
=========================================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
-----------------------------------------------------------------------------------------
exp_per_stu_ia_scaled -0.7820 1.1129 -0.7027 0.4825 -2.9679 1.4039
accommodations[R2] 263.11 1.0654 246.95 0.0000 261.02 265.20
accommodations[R3] 264.09 1.1533 228.99 0.0000 261.82 266.35
=========================================================================================
F-test for Poolability: 28.322
P-value: 0.0000
Distribution: F(49,561)
Included effects: Entity
For the curious, here is the results for the model fit over only the math data:
df_math_r3 = df.loc[(df['math_read'] == 'MAT')]
rewq = PanelOLS.from_formula("avg_naep ~ exp_per_stu_ia_scaled + accommodations + EntityEffects", df_math_r3).fit()
print(rewq.summary)
PanelOLS Estimation Summary
================================================================================
Dep. Variable: avg_naep R-squared: 0.4736
Estimator: PanelOLS R-squared (Between): 0.1861
No. Observations: 696 R-squared (Within): 0.4736
Date: Sat, Apr 12 2025 R-squared (Overall): 0.3130
Time: 09:32:00 Log-likelihood -2054.2
Cov. Estimator: Unadjusted
F-statistic: 289.72
Entities: 50 P-value 0.0000
Avg Obs: 13.920 Distribution: F(2,644)
Min Obs: 10.0000
Max Obs: 15.000 F-statistic (robust): 2.335e+04
P-value 0.0000
Time periods: 14 Distribution: F(2,644)
Avg Obs: 49.714
Min Obs: 37.000
Max Obs: 78.000
Parameter Estimates
=========================================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
-----------------------------------------------------------------------------------------
exp_per_stu_ia_scaled 8.2023 1.6415 4.9968 0.0000 4.9789 11.426
accommodations[R2] 263.15 1.3447 195.69 0.0000 260.50 265.79
accommodations[R3] 271.41 1.7230 157.52 0.0000 268.02 274.79
=========================================================================================
F-test for Poolability: 24.583
P-value: 0.0000
Distribution: F(49,644)
Included effects: Entity
6. Conclusion¶
It was pretty clearly from the visualizations that there wasn't an obvious relationship between per-student spending and NAEP scores. And that was corroborated with the models that were fit. It appears that spending per student isn't greatly correlated with math success and not at all with reading, at least measured by NAEP scores. Personally, I find the history and current trajectory of NAEP reading scores the most alarming finding.
Notes¶
1 Perhaps you're wondering why the notation for the spending term has the suffix _scaled. Originially, the ols function from statsmodels complained that "The condition number is large, 7.07e+04. This might indicate that there are strong multicollinearity or other numerical problems." I scaled the inflation adjusted per-student spending data by dividing by the mean to shut it up.